Merging Data Frames

Let's learn how to merge data frames with merge(). You'll use this in your Final Data Frame Project!

In [3]:
## I() keeps the names as character columns, which gives a sensible sort order
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
In [4]:
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
             "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
In [6]:
authors
Out[6]:
   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia       no
In [ ]:
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
In [1]:
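## same merge with the arguments swapped: same rows, columns in a different order
(m2 <- merge(books, authors, by.x = "name", by.y = "surname"))
## m1 and m2 agree on keys and values; an empty 'by' gives the 6 x 6 cross join
stopifnot(as.character(m1[, 1]) == as.character(m2[, 1]),
          all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]),
          dim(merge(m1, m2, by = integer(0))) == c(36, 10))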

## "R Core" is missing from authors and appears only in books:
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows
Out[1]:
   surname nationality deceased                         title other.author
1   McNeil   Australia       no     Interactive Data Analysis           NA
2   Ripley          UK       no            Spatial Statistics           NA
3   Ripley          UK       no         Stochastic Simulation           NA
4  Tierney          US       no                     LISP-STAT           NA
5    Tukey          US      yes     Exploratory Data Analysis           NA
6 Venables   Australia       no Modern Applied Statistics ...       Ripley
Out[1]:
      name                         title other.author nationality deceased
1   McNeil     Interactive Data Analysis           NA   Australia       no
2   Ripley            Spatial Statistics           NA          UK       no
3   Ripley         Stochastic Simulation           NA          UK       no
4  Tierney                     LISP-STAT           NA          US       no
5    Tukey     Exploratory Data Analysis           NA          US      yes
6 Venables Modern Applied Statistics ...       Ripley   Australia       no
Out[1]:
   surname nationality deceased                         title     other.author
1   McNeil   Australia       no     Interactive Data Analysis               NA
2   R Core          NA       NA          An Introduction to R Venables & Smith
3   Ripley          UK       no            Spatial Statistics               NA
4   Ripley          UK       no         Stochastic Simulation               NA
5  Tierney          US       no                     LISP-STAT               NA
6    Tukey          US      yes     Exploratory Data Analysis               NA
7 Venables   Australia       no Modern Applied Statistics ...           Ripley
Out[1]:
  k1 k2 data.x data.y
1  4  4      4      4
2  5  5      5      5
3 NA NA      2      1
Out[1]:
  k1 k2.x data.x k2.y data.y
1  4    4      4    4      4
2  5    5      5    5      5
3 NA    1      1   NA      1
4 NA    1      1    3      3
5 NA   NA      2   NA      1
6 NA   NA      2    3      3
Out[1]:
  k2 k1.x data.x k1.y data.y
1  4    4      4    4      4
2  5    5      5    5      5
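
The merges above either keep only matching rows (the default) or keep everything from both sides (all = TRUE). merge() also takes all.x and all.y to keep unmatched rows from just one side. Here is a minimal sketch of the four cases. The data frames left and right are made-up toy data for illustration, not part of the authors/books example.

```r
## Hypothetical toy data: ids 2 and 3 appear in both data frames
left  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

## default: keep only ids found in both data frames
inner <- merge(left, right, by = "id")

## all.x = TRUE: keep every row of the first data frame, fill the rest with NA
left_join <- merge(left, right, by = "id", all.x = TRUE)

## all.y = TRUE: keep every row of the second data frame
right_join <- merge(left, right, by = "id", all.y = TRUE)

## all = TRUE: shorthand for all.x = TRUE and all.y = TRUE together
full <- merge(left, right, by = "id", all = TRUE)

inner
left_join
right_join
full
```

Since merge() sorts the result on the key column by default, the unmatched rows land in key order rather than at the bottom, just as "R Core" did in the all = TRUE output above.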